Abstract

We  address  recent  criticisms  of  evidential  reasoning,  an  approach  to  the  analysis  of  imprecise 
and  uncertain  information  that  is  based  on  the  Dempster-Shafer  calculus  of  evidence. 

We  show  that  evidential  reasoning  can  be  interpreted  in  terms  of  classical  probability 
theory  and  that  the  Dempster-Shafer  calculus  of  evidence  may  be  considered  to  be  a  form 
of generalized probabilistic reasoning based on the representation of probabilistic ignorance
by  intervals  of  possible  values.  In  particular,  we  emphasize  that  it  is  not  necessary  to  resort 
to  nonprobabilistic  or  subjectivist  explanations  to  justify  the  validity  of  the  approach. 

We answer conceptual criticisms of evidential reasoning primarily by showing that they
confuse the current state of development of the theory (mainly its theoretical limitations
in the treatment of conditional information) with its potential usefulness in treating a
wide variety of uncertainty-analysis problems. Similarly, we indicate that the supposed
lack of decision-support schemes in generalized probability approaches is not a theoretical
handicap but, rather, an indication of basic informational shortcomings, and that exposing
such shortcomings is a desirable asset of any formal approximate-reasoning approach. We
also point to potential shortcomings of the underlying representation scheme in treating
general probabilistic reasoning problems.

We also consider methodological criticisms of the approach, focusing primarily on the
alleged counterintuitive nature of Dempster’s combination formula and showing that such
results arise from its misapplication. We also address issues of complexity and validity
of scope of the calculus of evidence.


1  Introduction 


If artificially intelligent systems are to produce adequate assessments of the state and
behavior of the real world, they must cope with information and knowledge that are
characterized by varying degrees of uncertainty, ignorance, and correctness. To address this
need, we have developed a technology called evidential reasoning. It is formally based upon
the Dempster-Shafer [18] theory of belief functions; it has been implemented as a
domain-independent automated reasoning system; and it has been successfully applied to a
range of real-world problems [11]. Yet its reliance on belief functions has drawn criticism.

Our choice of an approach based on the Dempster-Shafer theory was not arbitrary. We
believe that the theory confers important methodological advantages, such as its ability
to represent ignorance in a direct and straightforward fashion, its consistency with classical
probability theory, its compatibility with Boolean logic, and its manageable computational
complexity. At the same time, we recognize that other approaches may also complement
and augment the assessments provided by evidential reasoning.

We  examine  several  criticisms  of  belief  functions  that  have  appeared  in  the  literature, 
discussing  first  the  fundamental  theoretical  bases  supporting  the  belief-function  approach  and 
justifying  its  use  in  terms  of  the  requirements  imposed  by  ignorance  of  certain  probability 
distributions.  We  consider  the  nature  of  Dempster’s  rule  of  combination  and  argue  that 
negative  assessments  either  misinterpret  the  nature  of  the  distributions  being  combined  or 
ignore  the  basic  independence  assumptions  that  assure  its  validity.  We  stress  also  that  it  is 
not necessary to rely on explanations that are either nonprobabilistic or subjective to justify
the  validity  of  the  Dempster-Shafer  calculus  of  evidence. 

Furthermore,  we  show  that  certain  apparently  counterintuitive  properties  of  the  approach 
(e.g.,  the  “spoiled  sandwich”  paradox)  are  the  natural  consequence  of  considering  families 
of  possible  probability  distributions  that  solve  an  approximate  reasoning  problem.  In  the 
context  of  this  discussion,  we  indicate  also  the  inherent  pitfalls  of  “axiomatic”  approaches 
that  accept  or  reject  methodologies  on  the  basis  of  their  compliance  with  allegedly  intuitive 
principles. 

We  also  answer  critiques  based  on  the  computational  complexity  of  the  belief-function 
approach. Such criticisms claim that the complexity of probabilistic knowledge representations
grows exponentially with the size of the frame, thus making the theory unsuited for
automated  reasoning.  Other  comments  addressed  in  our  presentation  center  on  limitations 
on  the  representational  ability  of  belief  functions  and  the  lack  of  certain  methodological 
capabilities  (e.g.,  decision-making  mechanisms). 

Despite  the  criticism  that  belief  functions  have  drawn,  we  believe  that  evidential  reasoning 
is  well-founded  and  that  it  may  be  effectively  applied  to  the  solution  of  a  broad  range  of 
important  practical  problems. 

Most of our comments will be made in direct reply to a recent criticism of the
belief-function approach by Pearl [15], because we feel that his paper encompasses most of the
major worries and concerns expressed about the calculus of evidence. While most of the
discussion  of  this  paper  consists  of  direct  responses  to  issues  raised  by  Pearl  and  others, 
our  overall  objective  is  considerably  broader.  Our  answers  are  motivated  by  the  remarks 
of  DeGroot,  quoted  by  Pearl  at  the  conclusion  of  his  work,  about  the  need  to  use  our 
methodological  approaches  “. . .  with  the  utmost  care  and  in  accordance  with  the  highest 
ethical  standards.”  Our  aim,  like  Pearl’s,  is  to  enlighten  and  clarify,  through  careful  discussion 
of  rather  subtle  and  delicate  issues,  rather  than  to  engage  in  dogmatic  defense  of  one  approach 
to  the  detriment  of  another.  It  is  our  earnest  hope  that  this  work,  in  conjunction  with 
other  evaluations  of  the  belief-function  approach,  will  lead  to  a  better  understanding  of  its 
foundations,  capabilities,  and  limitations. 

2  On  Theoretical  Soundness 

The  theory  of  belief  functions  was  originated  by  Dempster  [4]  in  the  context  of  statistical 
research.  The  use  of  the  term  “belief,”  together  with  its  subjectivist  connotations,  is  due  to 
Shafer  [18],  who  first  applied  the  theory  to  the  analysis  of  imprecise  and  uncertain  evidence. 

Although much skepticism has been voiced about the naturalness of belief functions and
their agreement with conventional probabilistic approaches, their theoretical bases are
provided by a simple consideration of the role of evidence as a basic information carrier.

In classical probabilistic treatments, it is assumed that, under certain evidential
conditions $\mathcal{E}$,^1 the value $P(p \mid \mathcal{E})$ of the likelihood of a particular
statement $p$ is known. This view of evidence, while adequate to represent the informational
conditions of most controlled experimental setups, fails to model adequately the effects that
acquiring similar information has on our state of knowledge when the state of the world
cannot be so readily manipulated.

In such circumstances, whenever the evidence $\mathcal{E}$ is observed, three possible
informational outcomes may result from examination of further information that later turns
out to improve our state of knowledge: either $p$ is found to be true, $\neg p$ is found to be
true (i.e., $p$ is false), or such information is insufficient to determine the truth value of
$p$. Use of modal logic concepts, which are the bases of the formal model of Ruspini [17],
suggests the use of the notation $Kp$, $K\neg p$, and $Ip$ to identify these outcomes. Since
these alternatives are mutually exclusive and exhaustive, it is clear that

$$P(Kp) + P(K\neg p) + P(Ip) = 1.$$

Furthermore,  since  the  probability  of  Ip  may  be  positive,  it  will  be  true,  in  general,  that 

$$P(Kp) + P(K\neg p) \le 1.$$

^1 Throughout this paper, the symbol $\mathcal{E}$ is used to denote available evidence, i.e., a
collection of propositions about the real world that are known to be true either as the result
of direct observation or as the consequences of applicable background knowledge.

This model, based on a combination of classical probability methods and the modal
logic S5 [8,12], essentially provides, through the logical notion of possible world, a meaning
for the unary operator $K$ as the representation of the state of knowledge of a statistician
who is estimating the probability of truth of diverse propositions $p$ under evidential
conditions $\mathcal{E}$.

This  statistician  estimates  those  distributions  by  considering  multiple  samples  of  the 
state  or  behavior  of  a  real-world  system.  Using,  for  each  sample,  additional  information 
collected through further experimentation, the statistician may or may not be able to establish
the validity of a proposition $p$. If he is rather lucky, our statistician will find himself in
the ideal situation where he can actually “know”^2 or “prove” that the real world is in a state $s$ that
is  described  to  the  best  level  of  detail  that  is  necessary  to  understand  its  behavior  (i.e.,  a 
“possible  world”).  This  is  the  state  of  knowledge  usually  attained,  under  perfect  laboratory 
conditions,  when  experimental  samples  are  fully  analyzed  and  when  the  outcome  of  such 
analyses  is  classified  in  terms  of  a  set  of  exhaustive  and  mutually  exclusive  alternatives. 

Under  less  desirable  epistemological  circumstances,  however,  the  statistician  will  only  be 
able  to  prove  that  a  less  specific  proposition  q  is  true.  In  the  extreme  case  where  no  further 
information exists, he will be forced to say that his knowledge is limited to that provided by
the evidence $\mathcal{E}$, or that it is “vacuous.”

All  samples  so  analyzed,  however,  can  be  classified  as  to  the  “most  specific  knowledge” 
that could be determined in each case. The probability measure of the set $e(p)$ of samples
where the proposition $p$ was the most specific knowledge (called an epistemic set by
Ruspini) corresponds, in Shafer’s framework, to the value $m(p)$ of a mass function $m$,
i.e.,

$$m(p) = P(e(p)).$$

Correspondingly, the probability that $p$ was “known” to be true during statistical
experimentation corresponds to the value $\mathrm{Bel}(p)$ of Shafer’s belief function, i.e.,

$$\mathrm{Bel}(p) = P(Kp).$$

The connection between the ability of our statistician to know that $p$ was true and the
belief and mass functions that he estimates through experimentation justifies both the
expression epistemic probability, introduced by Ruspini [17] to describe the underlying
probabilities defined over a particular set of situations or scenarios (called the epistemic
universe), and the description of the functions as being “probabilities of provability” or
“probabilities of necessity” by Pearl [14], following a suggestion by Fagin and Halpern [6].

In short, all such interpretations are equivalent to the original model of Ruspini, where
a rational agent was able to prove the truth of different propositions under different
informational circumstances that were found to prevail, during his statistical experiment, with
different frequencies of occurrence.^3

^2 Note that, in the context of epistemic logics such as S5, the operator $K$ behaves as a
logical necessity operator. “Knowing” a proposition simply means that observations logically
imply such proposition, or that it is necessarily true.

Since the ability to prove a proposition $q$ entails the ability to prove any proposition $p$
that is implied by $q$, it should be clear that

$$\mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q),$$

which  is  the  fundamental  equation  relating  the  basic  structures  of  the  calculus  of  evidence. 
It  is  also  true  that 

$$\mathrm{Bel}(p) \le P(p) \le 1 - \mathrm{Bel}(\neg p),$$

providing bounds for the probability of $p$ that cannot be improved. This ability to
manipulate probability intervals by means of the compact representation scheme of mass
functions is the major reason for the appeal of the Dempster-Shafer methodology.
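
As a minimal sketch of how these quantities relate, the following Python fragment estimates a mass function from a hypothetical list of classified samples and then evaluates the belief in a proposition together with the dual upper bound. Propositions are modeled as frozensets of possible worlds, so that "q implies p" becomes "q is a subset of p"; the three-world frame, the sample list, and all function names are illustrative assumptions and not part of the formal model.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical frame of three mutually exclusive possible worlds.
FRAME = frozenset({"a", "b", "c"})

def mass_from_samples(samples):
    """Estimate m(p) = P(e(p)) as the relative frequency with which
    p was the most specific proposition proved for a sample."""
    counts = Counter(samples)
    total = len(samples)
    return {p: Fraction(n, total) for p, n in counts.items()}

def bel(m, p):
    """Bel(p): total mass of the propositions q that imply p (q subset of p)."""
    return sum((v for q, v in m.items() if q <= p), Fraction(0))

def upper(m, p):
    """Upper bound 1 - Bel(not p), with negation taken relative to FRAME."""
    return 1 - bel(m, FRAME - p)

# Assumed outcome of ten samples: each entry is the most specific
# proposition the statistician could prove for that sample.
samples = (
    [frozenset({"a"})] * 4          # world "a" established
    + [frozenset({"a", "b"})] * 3   # only "a or b" established
    + [frozenset({"c"})] * 1        # world "c" established
    + [FRAME] * 2                   # nothing beyond the evidence (vacuous)
)

m = mass_from_samples(samples)
a = frozenset({"a"})
print(bel(m, a), upper(m, a))  # 2/5 9/10
```

With these assumed counts, the probability of "a" is only known to lie between 2/5 and 9/10; the width of the interval is exactly the mass committed to propositions that are consistent with "a" without implying it.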

While  the  above  discussion  clarifies  the  nature  of  the  statistician’s  knowledge  modeled 
by  belief  and  mass  functions,  doubts  might  still  remain  as  to  their  utility  to  those  who  were 
not involved in their statistical estimation process. Such usage is, however, the same as that
made of any other probabilistic information. The analyst who observes $\mathcal{E}$ does not
have the luxury that was available to the statistician estimating epistemic probabilities, i.e.,
the ability to collect additional information that permits a more detailed characterization of
the state of the world, for the same reasons that the user of statistical tables is unable to
utilize the raw data of the estimating statistician. Under such circumstances, the analyst is
forced to rely on the probabilistic estimates provided by the statistician, which are believed
on the basis of the assumed regularity of the repetitive behavior of the system, the
epistemological cornerstone of probabilistic reasoning.

In other words, the “probability of provability” is the best information that is available to
the analyst, an observation that disposes not only of questions about its role in probabilistic
reasoning but also of Pearl’s worries about its use in lieu of the obviously more desirable
“probability of truth” [15]:

“why  we  should  concern  ourselves  with  the  probability  that  the  evidence  implies  A, 
rather  than  the  probability  that  A  is  true,  given  the  evidence?”. 

Clearly,  we  would  prefer  having  the  latter,  but,  unfortunately,  we  can  only  measure  the 
former. 

^3 Note, however, that while use of the terms “knowability,” “provability,” and “necessity” does much to
provide  adequate  semantics  to  the  calculus  of  evidence,  its  loose  usage  leads  to  unnecessary  confusion.  For 
example,  in  his  recent  criticism  [15],  Pearl  takes  some  questionable  semantic  license  with  the  term  “necessity,” 
mentioning,  for  example,  the  probability  that  a  decision  “will  have  to  made  out  of  compelling  necessity.”  Such 
“pragmatic”  necessity  does  not  have  anything  to  do,  of  course,  with  the  “logical  necessity”  that  underlies 
the  Dempster-Shafer  theory,  i.e.,  the  necessary  truth  of  a  proposition  given  available  evidence. 

Our interpretation of the major evidential functions and structures also quickly disposes
of erroneous arguments based on unintended interpretations of the intervals defined by
belief functions. Each such interval represents ignorance of a single probability value for a
proposition $p$ under fixed evidential conditions $\mathcal{E}$. If critics choose, for example,
to interpret such intervals as the possible values that conditional probabilities might attain
when further evidence is collected, as suggested by Pearl [13], belief functions will not,
indeed, behave according to such unintended semantics.

In  closing  this  section,  it  is  important  to  mention  other  alternative  views  of  the  structures 
of  the  calculus  of  evidence  such  as  that  recently  proposed  by  Smets  [19],  which  are  based  on  a 
nonprobabilistic  concept  of  belief.  Although  those  models  are  interesting  on  the  strength  of 
their own virtues, we still emphasize that such interpretations are not required to reconcile
the  calculus  of  evidence  with  conventional  probability  theory. 

In consideration of our ability to reconcile all structures and formulas of the calculus of
evidence, including Dempster’s formula, with conventional probability structures, such
as inner and outer probabilities, we do not feel strongly compelled to accept alternative
epistemic interpretations. Our skepticism in this regard is further supported by the observation
that, often, such epistemological alternatives are the result of misunderstandings about the
role of certain evidential formulas and processes (e.g., normalization). For the same reasons,
we remain unconvinced about the need to assign alternative interpretations to the structures
of the calculus of evidence or to its functions, as recently suggested by Halpern and Fagin [7]
and echoed by Pearl [15].

3  On  Decision  Support 

A more fundamental criticism of the calculus of evidence is often raised regarding the
output of generalized interval-probability approaches. Because these methods often fail,
owing to basic knowledge deficiencies, to rank decision choices by the value of some measure
that quantifies the desirability of each choice (e.g., expected utility), it is said that they
lack a decision-theoretic apparatus.

Although these arguments correctly point to the basic knowledge requirement that most
decision problems entail (if a rational choice is to be made, then we must have a proper
informational basis on which to make it), this obvious consideration is twisted to argue for
the necessity of estimating unknown probability and utility values when they are not
available. We do not think that this pragmatic necessity argument is either sound or compelling.

In  our  view,  the  calculus  of  evidence  may  be  used  in  a  straightforward  fashion  to  produce 
intervals  of  possible  utility-values.  When  such  intervals  overlap  and  cannot  be  ordered,  this 
fact simply reflects a basic deficiency in our knowledge. We regard “pragmatic
justifications” with the same concern that any experimental scientist must show about
proposals to guess what he has not measured: the ability to make decisions in the absence of
knowledge is, in our view, a handicap rather than an advantage of any method.

Far from lacking a decision-theoretic methodology, our approach provides an understandable
quantification of the undesirable effects that poor information has on our decision-making
ability, ordering decisions whenever it is rationally possible but advising us that
such ranking is not possible if our knowledge is insufficient. In brief, our approach not only
supports decision-making but, through its built-in sensitivity-analysis features, helps us to
determine what must be done to reach a happier epistemological state.^4
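
One common way to obtain such utility intervals, sketched here under stated assumptions rather than as the procedure of any particular system, is to let each focal proposition contribute its mass times the worst (for the lower bound) or best (for the upper bound) utility among the worlds it allows. The mass function, the utilities, and the function names below are hypothetical; decisions are ranked only when one interval lies entirely at or above the other.

```python
from fractions import Fraction

def utility_interval(m, utility):
    """Lower and upper expected utility of one decision under mass function m.

    m maps focal propositions (frozensets of worlds) to masses;
    utility maps each world to the payoff of the decision in that world."""
    lo = sum((v * min(utility[w] for w in q) for q, v in m.items()), Fraction(0))
    hi = sum((v * max(utility[w] for w in q) for q, v in m.items()), Fraction(0))
    return lo, hi

def rank(interval_a, interval_b):
    """Return 'a' or 'b' when one interval dominates the other, else None."""
    if interval_a[0] >= interval_b[1]:
        return "a"
    if interval_b[0] >= interval_a[1]:
        return "b"
    return None  # intervals overlap: available knowledge cannot rank them

# Hypothetical evidence: half the mass proves "rain", half proves nothing.
m = {
    frozenset({"rain"}): Fraction(1, 2),
    frozenset({"rain", "dry"}): Fraction(1, 2),
}
umbrella = utility_interval(m, {"rain": 8, "dry": 6})      # (7, 8)
no_umbrella = utility_interval(m, {"rain": 0, "dry": 10})  # (0, 5)
gamble = utility_interval(m, {"rain": 6, "dry": 12})       # (6, 9)

print(rank(umbrella, no_umbrella))  # a
print(rank(umbrella, gamble))       # None
```

The second comparison returns no ranking because the intervals overlap, which is the behavior argued for above: the method reports that the evidence is insufficient instead of guessing the missing probabilities.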

4  On  Dempster's  Rule  of  Combination 

The semantic model of the Dempster-Shafer theory also validates Dempster’s rule of
combination, which permits the combination of belief and mass functions corresponding
to different evidential observations made under certain conditions of independence.
When such conditions are not valid, use of this formula leads, of course, to erroneous results.
These are often, although incorrectly, considered to be an essential handicap of the evidential
reasoning approach rather than a consequence of its misapplication.

The  Dempster  formula  is,  currently,  the  principal  evidence  integration  mechanism  of  the 
belief-function  approach.  It  was  derived  in  the  context  of  a  basic  model  of  the  effect  of 
probabilistic  evidence  that  correctly  interprets  such  evidence  as  constraints  on  probability 
values  rather  than  as  the  source  of  the  actual  values,  which  are  typically  undetermined. 
It may be described as an expression that, under certain conditions of independence, yields
bounds for the conditional probability distribution $P(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$
on the basis of similar bounds for the probability distributions $P(\cdot \mid \mathcal{E}_1)$
and $P(\cdot \mid \mathcal{E}_2)$.

To understand the conceptual bases for Dempster’s formula of combination and its
consistency with conventional probability, we resort to a generalization of the logical model
used before to derive the basic relations of the calculus of evidence. Instead of considering a
single epistemic operator, corresponding to a single statistician or observer, we will consider
two such rational agents, with their knowledge modeled by means of two operators $K_1$
and $K_2$. Each of these rational agents will be assumed to be ignorant of the knowledge
possessed by the other, i.e., as if they were statisticians performing independent experiments
under different evidential conditions $\mathcal{E}_1$ and $\mathcal{E}_2$. Their common
knowledge, however, will be modeled by means of a nonindexed operator $K$ corresponding to
a third reliable agent that aggregates the statistical knowledge gathered by the other two.

Clearly, in a given applicable situation (i.e., the first agent observes $\mathcal{E}_1$ and the
second agent observes $\mathcal{E}_2$), the integrating agent, who does not add any knowledge
of his own, will be able to prove (or to “know” the truth of) a proposition $p$ if the other
agents provide individual items of information that, when combined (i.e., conjoined), imply
$p$, as expressed by the basic combination axiom:

$Kp$ is true if and only if there exist sentences $p_1$ and $p_2$ such that $K_1 p_1$ and
$K_2 p_2$ are true, and such that $p_1 \wedge p_2 \Rightarrow p$.

^4 For an example of an approach that incorporates decision-maker preferences into the
framework of the belief-function calculus, the reader is referred to a recent paper by Strat [21].

Using our three operators to generate all possible (i.e., logically consistent) states of
knowledge that may be attained by each of the three agents while assessing the state of a
real system, we may say that each of them has, as was the case before, knowledge about the
real world that may be represented by the “most specific”^5 propositions $p_1$, $p_2$, and $p$
that each has been able to prove (with $p$ being obviously more specific than either $p_1$ or
$p_2$). In the terminology of Ruspini’s semantic model, each of the agents is in an epistemic
state, denoted by $e_1(p_1)$, $e_2(p_2)$, and $e(p)$, respectively, each corresponding to the set
of all conceivable states of the real world (i.e., possible worlds) having such knowledge
characteristics.

The following important set equation, relating all of these types of epistemic sets as
subsets of our enhanced epistemic universe, is the basis for the derivation of various
evidential combination formulas:

$$e(p) = \bigcup_{p_1 \wedge p_2 = p} \bigl( e_1(p_1) \cap e_2(p_2) \bigr).$$

Of these formulas, the Dempster combination formula,

$$m(p) = \kappa \sum_{p_1 \wedge p_2 = p} m_1(p_1)\, m_2(p_2),$$

where

$$m(p) = P(e(p) \mid \mathcal{E}_1, \mathcal{E}_2), \qquad m_1(p_1) = P(e_1(p_1) \mid \mathcal{E}_1), \qquad m_2(p_2) = P(e_2(p_2) \mid \mathcal{E}_2),$$

and where $\kappa$ is a multiplicative factor, is the best known and used.
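
The combination formula can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code of any particular evidential reasoning system: propositions are represented as frozensets of possible worlds, conjunction is set intersection, an empty intersection marks a logically impossible pair, and the multiplicative factor is computed as the reciprocal of the total mass of the consistent pairs so that the combined masses sum to one. The function name and the example masses are hypothetical.

```python
from fractions import Fraction
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts from frozenset to mass).

    Returns the combined mass function and the multiplicative factor kappa."""
    raw = {}
    for (p1, v1), (p2, v2) in product(m1.items(), m2.items()):
        p = p1 & p2                      # conjunction of the two propositions
        if p:                            # skip logically impossible pairs
            raw[p] = raw.get(p, Fraction(0)) + v1 * v2
    consistent = sum(raw.values(), Fraction(0))
    if consistent == 0:
        raise ValueError("the two bodies of evidence are totally contradictory")
    kappa = 1 / consistent
    return {p: kappa * v for p, v in raw.items()}, kappa

# Hypothetical bodies of evidence over the frame {"a", "b", "c"}.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): Fraction(3, 5), frame: Fraction(2, 5)}
m2 = {frozenset({"b", "c"}): Fraction(1, 2), frozenset({"a", "b"}): Fraction(1, 2)}

m, kappa = dempster_combine(m1, m2)
print(kappa)                  # 10/7
print(m[frozenset({"a"})])    # 3/7
```

In this example the pair ("a", "b or c") is contradictory and carries a product mass of 3/10, so the remaining mass of 7/10 is rescaled by 10/7.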

Before reviewing the actual process leading to the derivation of Dempster’s formula,
it is important to pause and reflect upon the nature of the above set-theoretic equation and
its usefulness in deriving evidence combination formulas.

We may first note that this equation has been derived as a relation between subsets of
possible “epistemological states” that is valid regardless of any assumptions about
probabilistic structures and their properties (e.g., independence). As such, it provides the
bases not only for the derivation of the Dempster formula but also for a variety of formulas
that bound possible probability values within and outside the structures of the
Dempster-Shafer theory.

Basically, this formula provides the basis to extend a probability function $P$ that is known
over subsets of the form $e_1(p_1)$ and $e_2(p_2)$ (i.e., over two $\sigma$-algebras) to the set
of unions of sets of the form $e_1(p_1) \cap e_2(p_2)$ (i.e., another $\sigma$-algebra). If such
extension can be made uniquely, as is the case for Dempster’s formula, the resulting extension
may be used to generate both the conditional probability
$P(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ and its associated bounds Bel and Pl, which are
fully compliant with Shafer’s axioms. In other less fortunate cases (e.g., dependent
evidence), such extension is not unique, and the lower envelope of the possible extensions,
which is not a probability, will lead to bounds that do not satisfy the axioms of the calculus
of evidence.

^5 Note that such most-specific knowledge always exists and is unique but for logical
equivalences, since the conjunction of all proved theorems is itself a theorem.
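
To see why dependent evidence leaves the extension undetermined, consider the simplest case of two events whose individual probabilities are known but whose dependence is not. The Fréchet inequalities then bound the probability of their intersection, and the value given by independence is only one admissible point of that interval. The sketch below is an illustration of this general fact and not a construction taken from the calculus of evidence; the function name and the numbers are hypothetical.

```python
from fractions import Fraction

def intersection_bounds(pa, pb):
    """Frechet bounds on P(A and B) when only P(A) and P(B) are known."""
    lower = max(Fraction(0), pa + pb - 1)
    upper = min(pa, pb)
    return lower, upper

pa, pb = Fraction(7, 10), Fraction(1, 2)
lo, hi = intersection_bounds(pa, pb)
print(lo, hi)    # 1/5 1/2
print(pa * pb)   # 7/20, the single value selected by independence
```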

This equation is now being used to extend the evidential calculus approach by generalizing
the notion of conditional probability through study of the probabilistic relations that
define dependencies between the different types of epistemic sets (i.e., $e(p)$, $e_1(p_1)$,
and $e_2(p_2)$). Pearl [15], however, believes, apparently as the result of his examination of
the role of compatibility relations in the calculus of evidence, that this approach is
essentially limited in its expressive ability to set-theoretic relations between epistemic sets,
which correspond to classical logical conditional statements (i.e., material implications).

In fact, it may be easily seen from our epistemic identity that whenever the conditional
probabilities $P(e_2(p_2) \mid e_1(p_1))$ and $P(e_1(p_1) \mid e_2(p_2))$ are restricted to take
the values 0 or 1,^6 this identity may be used to map one body of evidence into another, i.e.,
by means of the compatibility relations that such probabilities define.

Since under these assumptions, however, there can be only one proposition $p_2$ for every
proposition $p_1$ such that $P(e_2(p_2) \mid e_1(p_1)) = 1$, and vice versa, the compatibility
relation that is so defined may be characterized by several implications of the form

$$e_1(p_1) \Rightarrow e_2(p_2),$$

and of the form

$$e_2(q_2) \Rightarrow e_1(q_1),$$

between knowledge states of one observer and knowledge states of the other, which are useful
to “transfer mass” between propositions. This correspondence must be contrasted with that
following from the limited interpretation given by Pearl, who, from knowledge of

$$e_1(p_1) \Rightarrow e_2(p_2),$$

concludes (by contraposition), correctly but narrowly, that

$$\neg e_2(p_2) \Rightarrow \neg e_1(p_1),$$

and then proceeds to attach all material implication paradoxes (e.g., the “ravens paradox”)
to the calculus of evidence as if they were an essential methodological bane. If that were
the case (clearly it is not), the same concerns should be raised about the use of
conditionals in conventional probability calculus.
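
A minimal sketch of such a “mass transfer”, in the set-based form commonly used when the calculus of evidence is implemented, is given below. It assumes that a compatibility relation is supplied as a mapping from each possible world of one frame to the set of worlds of a second frame that may coexist with it; each focal proposition then carries its mass to the union of the images of its worlds. The frames, the relation, and the function name are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def translate(m, compatible):
    """Carry mass function m from a first frame to a second frame.

    compatible[w] is the set of second-frame worlds compatible with the
    first-frame world w. A focal proposition q implies the union of the
    images of its worlds, so its mass is moved to that union."""
    out = {}
    for q, v in m.items():
        image = frozenset().union(*(compatible[w] for w in q))
        out[image] = out.get(image, Fraction(0)) + v
    return out

# Hypothetical relation between diagnoses and observable symptoms.
compatible = {
    "flu": {"fever"},
    "cold": {"fever", "no_fever"},
    "healthy": {"no_fever"},
}
m_diagnosis = {
    frozenset({"flu"}): Fraction(1, 2),
    frozenset({"flu", "cold"}): Fraction(1, 4),
    frozenset({"flu", "cold", "healthy"}): Fraction(1, 4),
}

m_symptom = translate(m_diagnosis, compatible)
print(m_symptom[frozenset({"fever"})])              # 1/2
print(m_symptom[frozenset({"fever", "no_fever"})])  # 1/2
```

Only the mass that proves "flu" is carried to "fever"; the two less specific propositions both map to the whole symptom frame and therefore contribute to ignorance, not to either outcome.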

The second observation that may be made about the nature of evidence combination, in
general, and the role of our basic set identity to generate combination formulas, in particular,
is that while the functions to be combined are conditional probabilities over two different
evidential sets $\mathcal{E}_1$ and $\mathcal{E}_2$ (i.e., the evidence observed by two agents),
the desired integrated probability is a distribution over $\mathcal{E}_1 \cap \mathcal{E}_2$
(since we know that both observations are correct). Except for unusual cases, however,
computation of $P(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ entails a “normalization”
operation that is fully consistent with the calculus of probability. Most of the normalization
“paradoxes” are the result of misunderstanding about what is being combined: two different
conditional probabilities rather than two different lower and upper bounds of the same
probability function.^7

^6 It may be shown from the definition of epistemic sets that, under such conditions,
knowledge of $P(e_2(p_2) \mid e_1(p_1))$ suffices to derive $P(e_1(p_1) \mid e_2(p_2))$.

Focusing now on the rationale for Dempster’s formula, we should notice first that the
epistemic sets $e_1(p_1)$ and $e_2(p_2)$ are such that

$$e_1(p_1) \subseteq \mathcal{E}_1, \qquad e_2(p_2) \subseteq \mathcal{E}_2,$$

i.e., the possible knowledge states of each statistician include awareness of the truth of the
evidence that is observed by each. Furthermore,

$$\mathcal{E}_1 = \bigcup_{p_1} e_1(p_1), \qquad \mathcal{E}_2 = \bigcup_{p_2} e_2(p_2),$$

where $p_1 \Rightarrow \mathcal{E}_1$ and $p_2 \Rightarrow \mathcal{E}_2$; i.e., each statistician
knows something that implies that his evidential observation is true (otherwise he would not
be “counting” that sample).^8

Assume  now  that  there  exists  a  probability  distribution  P  defined  over  the  space  of  all 
possible  epistemic  states  for  our  observing  statisticians  and  our  “integrating”  agent.  Each 
such  epistemic  state  is  a  possible  world  that  corresponds  to  a  possible  state  of  the  world  and 
to  a  possible  state  of  knowledge  for  each  agent  that,  in  addition,  is  consistent  with  the  laws 
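of logic. We will assume now that, whenever $p_1 \Rightarrow \mathcal{E}_1$ and
$p_2 \Rightarrow \mathcal{E}_2$,

$$P(e_1(p_1) \cap e_2(p_2)) = \begin{cases} P(e_1(p_1))\, P(e_2(p_2)), & \text{if } p_1 \wedge p_2 \neq \emptyset, \\ 0, & \text{otherwise.} \end{cases}$$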

This assumption simply states that when $\mathcal{E}_1$ and $\mathcal{E}_2$ are both true, the
probability that a rational observer will be in a particular knowledge, or epistemic, state does
not provide any information about the probability of the epistemic state of the other agent
(i.e., beyond ruling out logical impossibilities). In purely formal terms, we may say that
knowledge of values of $P$ over sets of the form $e_1(p_1)$ does not provide any indication,
beyond exclusion of logical impossibilities, of the values of $P$ over sets of the form
$e_2(p_2)$ and vice versa. The epistemic
states  of  our  two  agents  may  be  said,  therefore,  to  be  unrelated  in  that  knowledge  of  the 
state  of  one  of  our  observers  (by  our  integrating  agent)  does  not  provide  any  information 
about  the  state  of  the  other,  save  for  elimination  of  logical  impossibilities. 

^It  is  fair  to  say  that  much  of  the  skepticism  raised  by  the  normalization  used  in  Dempster’s  formula 
can  be  traced  to  the  exposition  given  by  Shafer  [18],  which  suggests  a  nonprobabilistic  method  of  evidence 
combination. 

^Recall  that  our  observers,  or  rational  agents,  are  statisticians  estimating  properties  of  certain  statistical 
distributions  by  classifying  each  sample  using  their  evidence  and  additional  sample-dependent  knowledge. 
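Noting now that

$$P(e_1(p_1) \mid \mathcal{E}_1) = \frac{P(e_1(p_1))}{P(\mathcal{E}_1)}, \qquad P(e_2(p_2) \mid \mathcal{E}_2) = \frac{P(e_2(p_2))}{P(\mathcal{E}_2)},$$

$$P(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1 \cap \mathcal{E}_2) = \frac{P(e_1(p_1) \cap e_2(p_2))}{P(\mathcal{E}_1 \cap \mathcal{E}_2)},$$

then, whenever $p_1 \wedge p_2 \neq \emptyset$,

$$P(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1 \cap \mathcal{E}_2) = \kappa\, P(e_1(p_1) \mid \mathcal{E}_1)\, P(e_2(p_2) \mid \mathcal{E}_2) = \kappa\, m_1(p_1)\, m_2(p_2),$$

from which Dempster’s formula readily follows.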

The normalization factor

$$\kappa = \frac{P(\mathcal{E}_1)\, P(\mathcal{E}_2)}{P(\mathcal{E}_1 \cap \mathcal{E}_2)}$$

has been the object of considerable concern on the part of both skeptics and proponents of
the calculus of evidence. The above expression, however, provides the rationale for its usage
while disposing of arguments about its alleged inconsistency with the probability calculus.
In that expression, the denominator $P(\mathcal{E}_1 \cap \mathcal{E}_2)$ appears as the consequence of the need to
derive probability distribution estimates with respect to the intersection of the two observed
evidences $\mathcal{E}_1$ and $\mathcal{E}_2$. The numerator of that expression simply reflects the need to combine
conditional distributions over the same reference set (i.e., the epistemic universe) while our
probabilistic knowledge is expressed over two of its subsets (i.e., $\mathcal{E}_1$ and $\mathcal{E}_2$).
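The multiply-and-normalize computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a definitive implementation: mass functions are assumed to be dictionaries from `frozenset` focal elements to exact fractions, and the names `combine`, `m1`, `m2`, and the frame `{a, b, c}` are hypothetical. The reciprocal of the mass remaining after incompatible pairs are discarded plays the role of the normalization factor.

```python
from fractions import Fraction
from itertools import product


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal elements, renormalize."""
    raw = {}
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        if c:  # pairs with empty intersection are logical impossibilities
            raw[c] = raw.get(c, Fraction(0)) + x * y
    total = sum(raw.values())
    if total == 0:
        raise ValueError("totally conflicting evidence; the rule is undefined")
    # 1 / total is the normalization constant applied to every product
    return {c: v / total for c, v in raw.items()}


# Hypothetical masses over the frame {a, b, c}
frame = frozenset("abc")
m1 = {frozenset("a"): Fraction(3, 5), frame: Fraction(2, 5)}
m2 = {frozenset("bc"): Fraction(1, 2), frame: Fraction(1, 2)}
m12 = combine(m1, m2)
print(m12[frozenset("a")], m12[frozenset("bc")], m12[frame])  # 3/7 2/7 2/7
```

Here three tenths of the product mass falls on the incompatible pair `{a}` and `{b, c}`, so the surviving products are rescaled by 10/7.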

The essence of the conditions that lend validity to the Dempster formula may be summarized
by saying that the formula’s usefulness is confined to the limited, but rather important,
cases where estimates of probabilistic likelihood have been formulated by two rational agents
on the basis of independent observations, while ignoring the evidence available to each other.

If our integrating agent is thought of as being concerned with estimating the probabilities
of certain events when both $\mathcal{E}_1$ and $\mathcal{E}_2$ are true, then we may say that, whenever the
conditions validating Dempster’s formula hold, knowledge of the fact that a particular
sample satisfies $p_1$ tells the agent nothing about the likelihood of $p_2$ (unless, of course, $p_1$
happens to be logically inconsistent with $p_2$). Furthermore, whenever our integrating agent is
done with his job, he should find out that estimating this joint distribution (i.e., over $\mathcal{E}_1 \cap \mathcal{E}_2$)
could have been accomplished in an easier fashion by estimating the marginal distributions
over $\mathcal{E}_1$ and $\mathcal{E}_2$ and deriving the joint distribution by multiplication and normalization.

Other accounts supporting the validity of Dempster’s formula and its consistency with
the probability calculus have been advanced by several authors. A particularly compelling
justification has been recently given by Wilson [22].




5  On  "Paradoxes" 


Criticisms of the Dempster formula may be broadly characterized as being the consequence
of basic misunderstandings about either its meaning or its validity.

In this section, we examine three alleged paradoxes of the theory, showing that the purported
inconsistencies are actually the results of conceptual misunderstandings or misrepresentations
of the position of those who, while generally supporting the calculus of evidence,
are concerned with its possible misapplication.

5.1  The  "Three-Prisoner"  Problem 

Turning our attention first to concerns about the validity of Dempster’s formula, we may
note that, in general, the examples offered against it ignore its scope of applicability, producing
counterintuitive results that are then used to dismiss the methodology as inadequate. Among those,
the “three-prisoner” problem discussed by Diaconis and Zabell [5] has been perhaps the most
quoted and discussed.

This problem is one of a variety of examples in which the combination formula is used
as a conditioning formula by assuming that one of the mass distributions being combined
simply assigns all of its mass to a proposition $p$ in the frame of discernment. Combination of
such a simple support function with another mass function associated with a belief function
$\mathrm{Bel}(\cdot)$ leads to the conditioning formula

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(q \vee \neg p) - \mathrm{Bel}(\neg p)}{1 - \mathrm{Bel}(\neg p)}.$$

In the particular case of the three-prisoner problem, concerned with the guilt or innocence
of a prisoner that has been chosen (by the Warden) as the guilty party by random draw among
three candidates $A_1$, $A_2$, and $A_3$, our “logical space” or frame of discernment is simply the
Boolean algebra induced by the three noncompatible propositions

$p_i$: “Prisoner $A_i$ has been found guilty,”

where $i = 1, 2, 3$. Since only one of the three prisoners is chosen by the Warden, we clearly
have

$$P(p_i) = \tfrac{1}{3}, \qquad i = 1, 2, 3.$$

(Note that $P$ is actually a classical, additive, probability distribution.)

Prisoner $A_1$ now asks the Jailer to name one of the innocent prisoners (other than himself),
arguing that such information would clearly be of little help to him as an indicator of his
potential fate. As Pearl notes, if $q$ stands for the proposition “The Jailer names $A_2$ as one
of the innocent,” then application of the conditioning rule leads to the result

$$\mathrm{Bel}(p_1 \mid q) = \tfrac{1}{2},$$

indicating that the conditional probability $P(p_1 \mid q)$ must be exactly $\tfrac{1}{2}$ instead of the “correct
solution”

$$0 \le P(p_1 \mid q) \le \tfrac{1}{2},$$

while also saying, against the correct intuition of $A_1$, that his chances of guilt have been
increased as the result of the irrelevant information provided by the Jailer. From such
an observation, Pearl concludes that the formula is seriously flawed, both because of the
counterintuitive result that it produces and for its “collapsing” of a family of solutions into
a single value.
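The value obtained from the conditioning rule can be checked directly. The sketch below is only an illustration: it assumes that $q$ is encoded, within the frame of the three guilt propositions, as the set “$A_2$ is innocent” (prisoner indices 1 and 3), and the helper names `bel` and `dempster_condition` are hypothetical.

```python
from fractions import Fraction

# Additive belief over the three propositions p_1, p_2, p_3.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
FRAME = frozenset(prior)


def bel(event):
    """Belief of a set of prisoner indices under the additive prior."""
    return sum(prior[i] for i in event)


def dempster_condition(target, given):
    """Bel(target | given) = (Bel(target or not given) - Bel(not given)) / (1 - Bel(not given))."""
    not_given = FRAME - given
    return (bel(target | not_given) - bel(not_given)) / (1 - bel(not_given))


# Assumed encoding of q: "A_2 is innocent" is the set {1, 3}.
result = dempster_condition(frozenset({1}), frozenset({1, 3}))
print(result)  # 1/2
```

The rule moves the belief in the guilt of $A_1$ from 1/3 to 1/2, which is the increase that Pearl objects to.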

Before proceeding to the discussion of Pearl’s concerns, we may note, in passing, that
this problem has been well known as a source of paradoxes and incorrect solutions within the
scope of the conventional probability calculus [2], quite independently of any issues of validity
of its treatment using the Dempster-Shafer calculus. The explanations given to describe the
conceptual errors leading to incorrect classical treatments resemble to some extent those that
shed light on the inapplicability of Dempster’s formula.

Returning now to the role of Dempster’s formula in this problem, we may first observe
that, although, at first glance, the distributions representing the Jailer’s and Warden’s choices
seem independent, it is actually impossible for the Jailer to tell $A_1$ that $A_2$ is one of those
to be spared if all he knew was that the Warden was choosing the guilty party by random
draw (i.e., he needs to know exactly who is the one chosen for punishment). To use the
terminology of Ruspini’s model, the probability of $A_2$ being named as one of the innocent
depends on the epistemic state of the Warden, thus violating the independence assumptions
of Dempster’s formula. If all possible combinations of truth values for the propositions
$p_i$, $i = 1, 2, 3$, and $q$ are tabulated, together with their probabilities, as is done in Table 1,
then it is clear that

$$P(q \mid p_3) = 1, \qquad P(q) = \tfrac{1}{3}(1 + \alpha),$$

where $0 \le \alpha \le 1$ represents the unknown probability that the Jailer will choose to name $A_2$
rather than $A_3$ as innocent if $A_1$ is actually the one chosen by the Warden as guilty.


Possible World    Warden’s Choice    Jailer Identifies    Probability
$W_1$             $A_1$              $A_2$                $\tfrac{1}{3}\alpha$
$W_2$             $A_1$              $A_3$                $\tfrac{1}{3}(1 - \alpha)$
$W_3$             $A_2$              $A_3$                $\tfrac{1}{3}$
$W_4$             $A_3$              $A_2$                $\tfrac{1}{3}$

Table 1: Possible Worlds in the Three-Prisoner Problem
But then,

$$P(q \mid p_3) \neq P(q),$$

violating the assumptions, discussed above, that validate the utilization of Dempster’s formula
(i.e., $P(e_2(p_2) \mid e_1(p_1)) = P(e_2(p_2))$). There is not, therefore, “total mystery,” as Pearl
says, as to the incorrect results obtained using Dempster’s formula. Because it fails to
be applicable, there should be little wonder that it leads to apparent paradox.
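The dependence can be verified by enumerating the four possible worlds of Table 1. The following sketch is illustrative only; the function names `worlds`, `prob`, and `summary` are hypothetical, and the unknown probability that the Jailer names $A_2$ when $A_1$ is guilty is passed in as `alpha`.

```python
from fractions import Fraction


def worlds(alpha):
    """Rows of Table 1: (Warden's choice, prisoner named by the Jailer, probability)."""
    third = Fraction(1, 3)
    return [
        (1, 2, alpha * third),
        (1, 3, (1 - alpha) * third),
        (2, 3, third),
        (3, 2, third),
    ]


def prob(ws, pred):
    """Total probability of the worlds satisfying the predicate."""
    return sum(p for guilty, named, p in ws if pred(guilty, named))


def summary(alpha):
    """Return P(q), P(q | p_3), and P(p_1 | q) for a given alpha."""
    ws = worlds(alpha)
    p_q = prob(ws, lambda g, n: n == 2)
    p_q_given_p3 = prob(ws, lambda g, n: g == 3 and n == 2) / prob(ws, lambda g, n: g == 3)
    p_p1_given_q = prob(ws, lambda g, n: g == 1 and n == 2) / p_q
    return p_q, p_q_given_p3, p_p1_given_q


for alpha in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(alpha, summary(alpha))
```

For every admissible `alpha` the second value is 1 while the first never exceeds 2/3, and the third sweeps the interval from 0 to 1/2 as `alpha` goes from 0 to 1.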

Although, as clearly shown by this discussion, the incorrect treatment of the three-prisoner
problem fails to invalidate Dempster’s rule of combination, we share the concern
of Pearl and others about its wide misapplication, particularly when it is used indiscriminately
to generate conditional distributions. In our research, we are endeavoring to extend
the original theory to derive expressions that produce and utilize conditional belief information
[16] that incorporates known dependencies between evidential bodies. These formulas
are intended to provide better interval estimates than the typically uninformative bounds
that are supplied by strict derivation of bounds in the absence of additional information by
the expression

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(p \wedge q)}{\mathrm{Bel}(p \wedge q) + \mathrm{Pl}(p \wedge \neg q)},$$

which is mentioned in Dempster’s original paper [4] and that has been the object of recent
concern by several authors [3,7].
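To make the difference between this conservative bound and Dempster’s conditioning formula concrete, the sketch below evaluates both on a small mass function. Everything in it is an assumption chosen for illustration: the frame `{x, y, z}`, the three masses, and the function names are hypothetical and do not come from the sources cited above.

```python
from fractions import Fraction

FRAME = frozenset("xyz")
# Hypothetical mass function; focal elements are subsets of FRAME.
mass = {
    frozenset("xz"): Fraction(1, 2),
    frozenset("x"): Fraction(1, 4),
    frozenset("y"): Fraction(1, 4),
}


def bel(event):
    """Total mass of focal elements contained in the event."""
    return sum(v for focal, v in mass.items() if focal <= event)


def pl(event):
    """Total mass of focal elements that intersect the event."""
    return sum(v for focal, v in mass.items() if focal & event)


def lower_conditional(q, p):
    """Bel(q | p) = Bel(p and q) / (Bel(p and q) + Pl(p and not q))."""
    return bel(p & q) / (bel(p & q) + pl(p - q))


def dempster_conditional(q, p):
    """Bel(q | p) = (Bel(q or not p) - Bel(not p)) / (1 - Bel(not p))."""
    not_p = FRAME - p
    return (bel(q | not_p) - bel(not_p)) / (1 - bel(not_p))


p, q = frozenset("xy"), frozenset("x")
print(lower_conditional(q, p), dempster_conditional(q, p))  # 1/2 3/4
```

The two values differ because the focal element `{x, z}` straddles $p$ and its complement: Dempster’s conditioning credits its mass to $q$, whereas the strict bound does not.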

In closing, we believe it is important to address other concerns of Pearl, apparently going
beyond the three-prisoner problem, about the counterintuitive nature of the “collapse” that
usage of the Dempster formula often produces, which is manifested by production of a single
conditional probability distribution when conditioning multiple members of a family $\mathcal{P}$ of
probabilities over some specific subset $q$. Just as it is true that all members of the family of
distributions

$$\mathcal{P} = \{P_t : t \in [0, 1]\}$$

defined in the set $X = \{a, b, c\}$ by the expression

$$P_t(x) = \begin{cases} \tfrac{1}{2}t, & \text{if } x = a, \\ \tfrac{1}{2}(1 - t), & \text{if } x = b, \\ \tfrac{1}{2}, & \text{if } x = c \end{cases}$$

are such that $P_t(\{a, b\}) = \tfrac{1}{2}$, despite their variability over other subsets, it is also true that
an extensive family of distributions may collapse into a single conditional probability without
violating any rational or probabilistic principles. Such “invariants” are, in fact, desirable as
elements that simplify the analysis of an otherwise complex probabilistic problem. For these
reasons, we believe that, if Dempster’s conditioning formula is applicable, its reduction
of the variability of probability values should not be a particular cause for concern as to its
validity.
5.2  The  Spoiled  Sandwich

While discussing the suitability of the calculus of evidence either as a form of generalized
probabilistic calculus or as a new theory that intends to capture a novel notion of belief,
Pearl [15] again faults the approach for failing to satisfy the following rationality principle
originally stated by Aleliunas [1]:

“If two diametrically opposed assumptions yield two different degrees of belief in a
proposition Q, then the unconditional degree of belief merited by Q should be somewhere
between the two.”

As  natural  as  such  a  principle  might  look  at  first,  the  following  simple  and  clever  example 
from  Wilson  [23]  clearly  shows  that  it  is  neither  intuitive  nor  appealing  but  points,  instead, 
to  the  pitfalls  of  creating  or  supporting  one’s  favorite  scheme  on  the  strength  of  supposedly 
rational  axioms. 

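Let $X = \{a, b, c, d\}$ with $A = \{a, b\}$ and $B = \{a, c\}$, so that $\overline{B} = \{b, d\}$. Consider the
family of probability distributions in $X$

$$\mathcal{P} = \{P_t : t \in [0, 1]\},$$

indexed by a parameter $t$ in $[0, 1]$ and defined by

$$P_t(\{a\}) = \tfrac{1}{2}t, \qquad P_t(\{b\}) = \tfrac{1}{2}(1 - t), \qquad P_t(\{c\}) = \tfrac{1}{4}, \qquad P_t(\{d\}) = \tfrac{1}{4},$$

and let

$$P_* = \inf_t \{P_t\}.$$

Then, clearly,

$$P_t(A) = \tfrac{1}{2}t + \tfrac{1}{2}(1 - t) = \tfrac{1}{2}$$

and, therefore, $P_*(A) = \tfrac{1}{2}$. The conditional probabilities $P_t(A \mid B)$ and $P_t(A \mid \overline{B})$ are given
by the expressions

$$P_t(A \mid B) = \frac{P_t(\{a\})}{P_t(\{a, c\})} = \frac{\tfrac{1}{2}t}{\tfrac{1}{4} + \tfrac{1}{2}t}, \qquad P_t(A \mid \overline{B}) = \frac{P_t(\{b\})}{P_t(\{b, d\})} = \frac{\tfrac{1}{2}(1 - t)}{\tfrac{1}{4} + \tfrac{1}{2}(1 - t)},$$

from which the lower bounds

$$P_*(A \mid B) = \inf_t P_t(A \mid B) = 0, \qquad P_*(A \mid \overline{B}) = \inf_t P_t(A \mid \overline{B}) = 0$$

are easily derived. It is clear, however, that

$$\tfrac{1}{2} = P_*(A) > P_*(A \mid B) = P_*(A \mid \overline{B}) = 0,$$

showing that the sandwich principle is violated even within the confines of conventional
probability theory.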
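The violation can be checked numerically on a grid of parameter values. The sketch below is illustrative and rests on an explicit assumption about the family: $P_t(\{a\}) = t/2$, $P_t(\{b\}) = (1 - t)/2$, and $P_t(\{c\}) = P_t(\{d\}) = 1/4$. Any family in which the masses of $c$ and $d$ stay bounded away from zero would behave the same way. The helper names are hypothetical.

```python
from fractions import Fraction


def p_t(t):
    """Assumed family: mass t/2 on a, (1 - t)/2 on b, and 1/4 on each of c and d."""
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    return {"a": half * t, "b": half * (1 - t), "c": quarter, "d": quarter}


def prob(dist, event):
    """Probability of a set of outcomes."""
    return sum(dist[x] for x in event)


def cond(dist, event, given):
    """Conditional probability P(event | given)."""
    return prob(dist, set(event) & set(given)) / prob(dist, given)


A, B, not_B = {"a", "b"}, {"a", "c"}, {"b", "d"}
grid = [Fraction(k, 100) for k in range(101)]  # t = 0, 0.01, ..., 1

lower_A = min(prob(p_t(t), A) for t in grid)
lower_A_given_B = min(cond(p_t(t), A, B) for t in grid)
lower_A_given_not_B = min(cond(p_t(t), A, not_B) for t in grid)
print(lower_A, lower_A_given_B, lower_A_given_not_B)  # 1/2 0 0
```

The unconditional lower bound sits strictly above both conditional lower bounds, so it cannot lie between them.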

5.3  Other  Ways  to  Spoil  the  Sandwich 

Although such simple examples should suffice to dispose of concerns about spoiled sandwiches,
we feel that Pearl’s discussion of the problem deserves a more detailed analysis,
mainly because of its philosophical implications for rational thinking. This is particularly
important because loose use of such terms as “assured winnings,” “support,” or “belief” in
the absence of a sound, formal interpretive framework may quickly mislead those engaged in
the comparison of alternative methodologies.

In an example, called “the Peter, Paul, and Mary Sandwich problem,” Pearl presents a
betting situation in which Mary prepares either a ham or a turkey sandwich, promising to
pay Paul $1000 should he guess correctly the type of sandwich that she has prepared. Not
having a clue as to Mary’s choice, Paul then flips a coin, guessing “ham” if the coin turns
up heads and guessing “turkey” if it comes up tails. Paul, as Pearl notes, behaves like an
“incurable Bayesian,” reckoning that

$$P(\text{win}) = P(\text{win} \mid \text{turkey})\, P(\text{turkey}) + P(\text{win} \mid \text{ham})\, P(\text{ham}) = P(\text{tails} \mid \text{turkey})\, \alpha + P(\text{heads} \mid \text{ham})\, (1 - \alpha) = \tfrac{1}{2},$$

regardless of the value $\alpha$ of the probability that Mary has actually prepared a turkey sandwich.
Thus, in spite of not being “assured” a win or having “supporting evidence,” Paul can
invoke the rationality (doubtful, as we already saw) of the sandwich principle and argue that
he does not need to engage in unnecessary knowledge acquisition or experimentation [15]:

“If  every  possible  outcome  of  an  experiment  would  lead  you  to  choose  the  same  action, 
then  you  ought  to  choose  that  action  without  running  the  experiment.” 

From such an observation, Pearl proceeds to fault the philosophical underpinnings of the
evidential reasoning approach, eventually going as far as to suggest that, should Bayesian
orthodoxy be inapplicable, Dempster’s formula (which, he freely admits, does not play
any role in this example) be replaced by other formulas such as the well-known bounds
recently rediscovered by Halpern and Fagin [7].

In the light of our previous example about the rather inconvenient ability of conventional
probability families to spoil sandwiches, all of these pronouncements look increasingly
suspicious. What, however, may we say is wrong? This question may be answered in two
equivalent ways.
We may say first, keeping ourselves at the informal discussion level, that experiments
may often interact with probabilities in complex ways that Pearl has evidently not
considered. Nothing in Pearl’s formalism suggests, for example, that the sandwich has
already been prepared and that it may not be artfully substituted by Mary to assure that
Paul always loses, thus invalidating his hopes of having at least a 50 percent chance of
winning.

The second, more formal, rendering of this observation is again based on the semantic
model of Ruspini. In this, and in other similar problems, we have several agents that deliberate
about the state of the world on the basis of their knowledge and knowledge of the
knowledge of others. If the unary operator $K$ represents the state of knowledge of one of
these agents, then, as observed before, our agent is always in one of three possible epistemological
states with respect to the validity of a proposition $p$: either he knows that $p$ is true
(denoted $Kp$), or he knows that $p$ is false (denoted $K\neg p$), or he may be ignorant of such
truth (i.e., $\neg Kp \wedge \neg K\neg p$, denoted $Ip$).

In standard accounts, assuming that knowledge of the truth of one proposition does not
affect the likelihood of truth of other propositions,® we are simply concerned with a single
form of conditional probability: that measuring the likelihood of $p$ being true when $q$ is
true. In more complex epistemological situations, we may need to be concerned with such
quantities as $P(Kp \mid Kq)$, $P(Kp \mid q)$, $P(Kp \mid Iq)$, and the like. In other words, $\mathrm{Bel}(p \mid q)$
measures the support that knowledge of the truth of $q$ provides to the truth of $p$, rather than
the support provided by the truth of $q$ to the truth of $p$.

In the Peter, Paul, and Mary sandwich problem, Pearl implicitly assumes that

$$P(K_{\text{MARY}}\,\text{heads}) = 0, \qquad P(K_{\text{MARY}}\,\text{tails}) = 0,$$

$$P(\text{turkey} \mid I_{\text{MARY}}\,\text{heads}) = \alpha, \qquad P(\text{ham} \mid I_{\text{MARY}}\,\text{heads}) = 1 - \alpha,$$

concluding correctly, by application of the total probability law, over the exhaustive and
exclusive set of possibilities

$$\{K_{\text{MARY}}\,\text{heads},\; K_{\text{MARY}}\,\text{tails},\; I_{\text{MARY}}\,\text{heads}\},$$

that Paul has at least a 50 percent chance of winning.

This correct use of the total probability law does not mean that, by contrast, one should
assume that the full extent of the conditional information provided by belief functions is
limited to the conditional support functions

$$\mathrm{Bel}(p \mid q) = P(p \mid Kq), \qquad \mathrm{Bel}(p \mid \neg q) = P(p \mid K\neg q),$$

®The  relations  between  knowledge  and  truth  are  more  evident  if  “knowing”  is  thought  of  as  sensing  or 
observing,  and  if  independence  is  understood  as  a  lack  of  relationship  between  the  errors  of  the  sensors. 
as Pearl evidently does. In short, not knowing $p$ is not the same as knowing $\neg p$. The example
of the Peter, Paul, and Mary sandwich shows that one needs to consider states of ignorance
that, when properly accounted for, spoil even the best-conceived principles of rationality.

To fully appreciate the complexity of the problem, suppose that we change Pearl’s implicit
assumptions, bringing the previously absent Peter into the scene as a spy acting on behalf
of Mary. In this new scenario, still consistent with Pearl’s explicit statement of the problem,
Peter, spying on Paul’s coin flipping experiment, alerts Mary, who, being rather artful and
deft of hand, substitutes the sandwich so as to make sure that Paul always loses. In this
case,

$$P(\text{ham} \mid K_{\text{MARY}}\,\text{tails}) = 1, \qquad P(\text{turkey} \mid K_{\text{MARY}}\,\text{heads}) = 1;$$

and, most importantly,

$$P\big((K_{\text{MARY}}\,\text{heads}) \cup (K_{\text{MARY}}\,\text{tails})\big) = 1,$$

i.e., Mary is never ignorant as to what Paul will bet.
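The contrast between the two scenarios can be made concrete with a short computation. The sketch below is illustrative only: it assumes a fair coin, represents each scenario by the conditional distribution of the sandwich given the coin outcome, and uses hypothetical names (`win_probability`, `ignorant_mary`, `informed_mary`).

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # assumed fair coin
GUESS = {"heads": "ham", "tails": "turkey"}  # Paul's guessing rule


def win_probability(sandwich_given_coin):
    """P(win), given P(sandwich | coin outcome) for each coin outcome."""
    return sum(HALF * sandwich_given_coin[coin][GUESS[coin]] for coin in GUESS)


def ignorant_mary(alpha):
    """Mary cannot see the coin: turkey with probability alpha in either case."""
    dist = {"turkey": alpha, "ham": 1 - alpha}
    return {"heads": dist, "tails": dist}


# Mary, alerted by Peter, always serves the sandwich that Paul did not guess.
informed_mary = {
    "heads": {"turkey": Fraction(1), "ham": Fraction(0)},
    "tails": {"turkey": Fraction(0), "ham": Fraction(1)},
}

for alpha in (Fraction(0), Fraction(1, 3), Fraction(1)):
    print(alpha, win_probability(ignorant_mary(alpha)))  # always 1/2
print(win_probability(informed_mary))  # 0
```

When Mary is ignorant of the coin, the value of `alpha` is irrelevant; once she knows the outcome, the same total probability computation gives Paul no chance at all.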

The  Peter,  Paul,  and  Mary  sandwich  example  does  not,  in  our  view,  invalidate  the 
applicability  of  the  evidential  approach,  but  rather  highlights  the  need  to  make  necessary 
discriminations  between  propositional  truth,  knowledge  of  that  truth,  and  the  interplay 
between  such  conditions  that  are  likely  to  be  glossed  over  by  cursory  analyses  based  on 
conventional  approaches. 

5.4  The  Disagreeing  Experts 

Another  common  misunderstanding  regarding  the  role  of  Dempster’s  combination  formula 
is  that  provoked  by  an  example  of  Zadeh  [24],  which  is  often  described  as  an  indication  of 
theoretical  inadequacy. 

This example concerns two reliable experts who assess, in a rather conflicting fashion, the
likelihood of three noncompatible events $A$, $B$, and $C$ as shown in Table 2. Representation
of each of the experts’ assessments as a mass distribution followed by their combination with
Dempster’s rule yields $P(B) = 1$, indicating that the “true” event is $B$, an alternative
considered to be rather unlikely by either of the assessors.


Observer    P(A)    P(B)    P(C)
1           0.99    0.01    0
2           0       0.01    0.99

Table 2: Experts Disagree on the State of the World
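The result of pooling the two rows of Table 2 can be reproduced with a few lines of Python. This is a minimal sketch, not a general implementation: because every focal element here is a single event, Dempster’s rule reduces to multiplying the two masses event by event and renormalizing. The names `expert_1`, `expert_2`, and `combine_singletons` are hypothetical.

```python
from fractions import Fraction

expert_1 = {"A": Fraction(99, 100), "B": Fraction(1, 100), "C": Fraction(0)}
expert_2 = {"A": Fraction(0), "B": Fraction(1, 100), "C": Fraction(99, 100)}


def combine_singletons(m1, m2):
    """Dempster's rule when every focal element is a single event."""
    products = {x: m1[x] * m2[x] for x in m1}
    agreement = sum(products.values())  # mass not lost to conflict
    pooled = {x: v / agreement for x, v in products.items()}
    return pooled, agreement


pooled, agreement = combine_singletons(expert_1, expert_2)
print(pooled["B"], agreement)  # 1 1/10000
```

Only one ten-thousandth of the product mass survives the combination, and all of it falls on $B$.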
Although this example is often quoted as an example of the failure of Dempster’s rule,
it is clear that each of the rows in Table 2 defines a conventional probability distribution, thus
suggesting that the problem is likely to lie elsewhere. While one may be tempted to defend
any method of evidence combination by saying that the evidence, however peculiar, indicates
that Observer 1 is ruling out alternative $C$ while Observer 2 is excluding alternative $A$, thus
leaving only $B$ as the sole possible answer, it is clear, upon further examination, that the
rows of Table 2 cannot possibly be evaluations of the same probability distribution. If that
were the case, then at least one of the experts must be wrong, since there can only be one
correct probability distribution, contradicting the assumption that they are both reliable.

Clearly, if the example is to make any sense under any type of probabilistic interpretation,
each row must correspond to a different conditional probability where the conditions correspond
to different observations available to each expert. A simple example, suggested by a
recent example used by Kyburg [9] to address other probabilistic reasoning issues, will help
to clarify matters.

In  this  example  we  are  being  asked  to  reason,  on  the  basis  of  available  evidence,  about 
the  taste  and  edibility  of  certain  berries  that  may  be  either  small  or  large;  and  red  or  blue; 
have  good  or  bad  taste;  or  be  safe  or  poisonous  to  eat.  We  will  assume  that  the  berries  in 
question  are  distributed  according  to  the  distribution  shown  in  Table  3. 


Color    Size     Taste/Edibility    Probability
Red      Small    Good/Edible        99/199
Blue     Large    Bad/Edible         99/199
Red      Large    Poisonous          1/199

Table 3: The Berries Probability Distribution


If now a berry is picked up and found by an expert to be large, he will correctly conclude
from such evidence that

$$P(\text{Good} \mid \text{Large}) = 0, \qquad P(\text{Poisonous} \mid \text{Large}) = 0.01, \qquad P(\text{Bad Taste} \mid \text{Large}) = 0.99.$$

Another expert, noticing that the berry is red, will conclude, on the other hand, that

$$P(\text{Good} \mid \text{Red}) = 0.99, \qquad P(\text{Poisonous} \mid \text{Red}) = 0.01, \qquad P(\text{Bad Taste} \mid \text{Red}) = 0.$$

Clearly the evidential implications of these two separate observations are identical to the
situation summarized in Table 2. Examination of Table 3, however, reveals that

$$P(\text{Poisonous} \mid \text{Red}, \text{Large}) = 1,$$

a correct solution that must be rationally expected from any reasoning method that purports
to be valid.
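These conditional probabilities can be read off Table 3 mechanically. The sketch below is an illustration under the assumption that the three rows of the table exhaust the population of berries; the dictionary layout and the function name `conditional` are hypothetical.

```python
from fractions import Fraction

# Rows of Table 3: (color, size, taste/edibility) -> probability
berries = {
    ("Red", "Small", "Good"): Fraction(99, 199),
    ("Blue", "Large", "Bad"): Fraction(99, 199),
    ("Red", "Large", "Poisonous"): Fraction(1, 199),
}


def conditional(outcome, **observed):
    """P(outcome | observed color and/or size), computed from the joint table."""
    def matches(row):
        color, size, _ = row
        return (observed.get("color", color) == color
                and observed.get("size", size) == size)

    given = {row: p for row, p in berries.items() if matches(row)}
    hit = sum(p for row, p in given.items() if row[2] == outcome)
    return hit / sum(given.values())


print(conditional("Bad", size="Large"))                      # 99/100
print(conditional("Good", color="Red"))                      # 99/100
print(conditional("Poisonous", color="Red", size="Large"))   # 1
```

Each expert’s row is a conditional distribution with respect to a different reference class, and conditioning on both observations at once leaves only the poisonous berry.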

The solution to the puzzle of the disagreeing experts lies in recognizing that there is,
in fact, no disparity of opinion among them. Each is providing quantitative measures of
likelihood with respect to different reference classes. The Dempster formula should never
be applied to pool partial information about the same probability distribution. Furthermore,
as shown by a sensitivity analysis of the results of its application to the berries example,
its usage in situations where there is considerable disparity between reference classes (as
suggested by the large normalization factor) should be discouraged on the basis of practical
rather than conceptual considerations.
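The practical concern can be illustrated with a small perturbation experiment. The sketch below is a hypothetical example and not the sensitivity analysis referred to above: it assumes that each expert’s estimate is off by a small amount `eps`, moved from the favored alternative to the one that expert had ruled out, and then pools the perturbed rows as before. The function name `pooled_b` is an assumption.

```python
from fractions import Fraction


def pooled_b(eps):
    """Pooled mass on B after each expert shifts eps to the excluded alternative."""
    one_pct = Fraction(1, 100)
    expert_1 = {"A": Fraction(99, 100) - eps, "B": one_pct, "C": eps}
    expert_2 = {"A": eps, "B": one_pct, "C": Fraction(99, 100) - eps}
    products = {x: expert_1[x] * expert_2[x] for x in expert_1}
    return products["B"] / sum(products.values())


for eps in (Fraction(0), Fraction(1, 1000), Fraction(1, 100)):
    print(eps, pooled_b(eps))
```

Under this assumed perturbation, an error of one percentage point in each estimate moves the pooled value for $B$ from 1 to 1/197, which shows how unstable the result is when nearly all of the product mass is discarded by normalization.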

6  On  Complexity  and  Generality 

The  potential  complexity  of  the  belief-function  approach  to  represent  and  manipulate  interval 
constraints  on  a  family  of  probability  distributions  has  been  often  mentioned  as  a  handicap 
of  the  evidential  reasoning  methodology.  In  spite  of  such  misgivings,  two  major  empirical 
observations  have  indicated  that  the  approach  is  applicable  to  a  wide  variety  of  practical 
problems. 

First, our experience shows that, notwithstanding criticisms based on unrealistic worst-case
scenarios, the approach is computationally efficient. In particular, we have found that
representation of belief functions in terms of mass functions results in a storage and manipulation
scheme that is both economical and easy to understand. In addition, we have
successfully implemented tools, such as summarization and coarsening operators, which may
be effectively utilized to limit representational complexity.

Second, our current functional operators have been chosen to guarantee that the manipulation
of evidential knowledge results also in knowledge that may be represented in the
evidential framework (i.e., the operators are closed).
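As an illustration of why the mass-function representation is economical, the sketch below stores only the focal elements and derives belief and plausibility on demand. It also shows one possible coarsening operator, which replaces each focal element by the union of the blocks of a coarser partition that it meets. This particular construction is an assumption made for the example and is not necessarily the operator used in our tools; the names `bel`, `pl`, `coarsen`, and the sample data are hypothetical.

```python
from fractions import Fraction


def bel(mass, event):
    """Total mass of focal elements contained in the event."""
    return sum(v for focal, v in mass.items() if focal <= event)


def pl(mass, event):
    """Total mass of focal elements that intersect the event."""
    return sum(v for focal, v in mass.items() if focal & event)


def coarsen(mass, blocks):
    """Replace each focal element by the union of the partition blocks it meets."""
    coarse = {}
    for focal, v in mass.items():
        image = frozenset().union(*(b for b in blocks if b & focal))
        coarse[image] = coarse.get(image, Fraction(0)) + v
    return coarse


# Hypothetical mass function over the frame {1, 2, 3, 4}
mass = {
    frozenset({1}): Fraction(1, 2),
    frozenset({2, 3}): Fraction(1, 4),
    frozenset({1, 2, 3, 4}): Fraction(1, 4),
}
blocks = [frozenset({1, 2}), frozenset({3, 4})]
coarse = coarsen(mass, blocks)
print(len(mass), len(coarse))  # 3 2
print(bel(coarse, frozenset({1, 2})), pl(coarse, frozenset({3, 4})))  # 1/2 1/2
```

In this example the coarsened function has fewer focal elements yet assigns the same belief and plausibility to the blocks of the partition.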

The lack of generality of the belief-function approach to represent general lower-upper
probability constraints is well known [10]. Our reliance on the methodology is primarily
the result of practical considerations: although we would prefer to manipulate more general
constraints on probability values, compelling computational efficiency arguments force us to
limit the scope of the problems considered to those capable of being at least approximately
solved by a belief-function treatment.

Being, in general, partial toward interpretations of evidential structures that are fully
compatible with probability theory, our current research is being directed toward the development
of more general, yet efficient, representation and manipulation methods.

Our current concerns with the manipulation of conditional and dependent evidence (i.e.,
the evidential counterpart of conditional probabilities) show, for example, that, for some
important problems, the results of evidential combination fall outside the scope of its representational
capabilities. In our experience, these methodological limitations are more worrisome
than any of the supposedly paradoxical results arising from its misuse or its claimed
lack of a decision-making apparatus.

Preliminary  results  [16]  indicate,  on  the  other  hand,  that  the  belief-function  approach 
may  be  used  to  approximate  the  results  of  these  evidential  combination  operations  and 
that  extended  representation  mechanisms  [20]  may  yet  be  developed  to  treat  more  general 
evidential  problems.  This  research  also  shows  the  basic  errors  inherent  in  criticisms  that 
regard  the  belief-function  approach  as  a  fully  developed  methodology  incapable  of  sustaining 
further  enhancement  and  modification.  Because  it  has  been  studied  in  depth  for  only  15 
years,  its  technological  status  is  that  of  a  young  discipline,  being  both  capable  of  enhancement 
on  its  own  and  of  combination  with  other  approaches  to  produce  more  general  tools  for 
probabilistic  reasoning.  Far  from  proving  that  we  have  reached  a  technological  plateau,  our 
investigations  indicate  that  much  is  yet  to  be  gained  from  such  a  development  and  integration 
process. 